Hi, I have noticed that some of our tooling has problems with histograms scraped from nginx.
The problem seems to be that one worker can process a `collect()` call while another one is in the middle of updating the histogram counters. This can result in the data containing a higher count for some `le` values than for `le="+Inf"`.
Such an inconsistency breaks all kinds of assumptions that various tools might make. For us it manifested as Prometheus thinking that some counters produced by recording rules had been reset (because `{le="+Inf"} - {le="0.1"}` was negative), which resulted in huge jumps in the metrics.
The proposed fix is rather simple: update the `+Inf` bucket before the other buckets (which are already incremented in descending order). This still allows inconsistent values to be returned, but at least the common assumption of non-decreasing bucket values can't be broken.
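
To illustrate the ordering argument, here is a minimal Python sketch (not the library's Lua code). It assumes the reader sees the counters at a single instant between two increments, and that the previous order updated `+Inf` after the finite buckets. The bucket bounds and all function names are hypothetical.

```python
import math

# Hypothetical bucket upper bounds; the last one stands for le="+Inf".
BOUNDS = [0.1, 0.5, 1.0, math.inf]


def write_steps(value, inf_first):
    """Return the order in which one observation increments its buckets."""
    finite_desc = sorted(
        (b for b in BOUNDS if b != math.inf and value <= b), reverse=True
    )
    if inf_first:
        return [math.inf] + finite_desc   # proposed order
    return finite_desc + [math.inf]       # assumed previous order


def snapshots(value, inf_first):
    """Yield the counts a reader could see after each single increment."""
    counts = {b: 0 for b in BOUNDS}
    for bound in write_steps(value, inf_first):
        counts[bound] += 1
        yield dict(counts)


def is_monotonic(counts):
    """True if cumulative counts never decrease as the bound grows."""
    ordered = [counts[b] for b in BOUNDS]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


old_order = [is_monotonic(s) for s in snapshots(0.05, inf_first=False)]
new_order = [is_monotonic(s) for s in snapshots(0.05, inf_first=True)]
```

Under these assumptions every intermediate state in `new_order` is monotonic, while `old_order` contains states where a finite bucket is ahead of `+Inf`. A reader that fetches keys one at a time across several writes is not covered by this model.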
A proper fix would probably require locking, as there is AFAIK no way to atomically increment multiple values in the shared dict.
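
For comparison, the locking idea could look roughly like the Python sketch below. It is only an illustration: nginx workers are separate processes sharing a dict, so an in-process `threading.Lock` would not apply there directly, and the `LockedHistogram` class is hypothetical.

```python
import math
import threading


class LockedHistogram:
    """Cumulative histogram whose updates and reads share one lock."""

    def __init__(self, bounds):
        self._bounds = sorted(bounds) + [math.inf]
        self._counts = {b: 0 for b in self._bounds}
        self._lock = threading.Lock()

    def observe(self, value):
        # All buckets for one observation change while the lock is held,
        # so a reader never sees a partially applied update.
        with self._lock:
            for bound in self._bounds:
                if value <= bound:
                    self._counts[bound] += 1

    def collect(self):
        # Copy under the same lock to get a consistent snapshot.
        with self._lock:
            return dict(self._counts)
```

The cost is that every observation and every scrape contends on the same lock, which is the trade-off the ordering fix avoids.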